Unsupervised decoding of encoded reasoning using language model interpretability
As large language models become increasingly capable, there is growing concern that they may develop reasoning processes that are encoded or hidden from human oversight. To investigate whether current interpretability techniques can penetrate such encoded reasoning, we construct a controlled testbed by fine-tuning a reasoning model (DeepSeek-R1-Distill-Llama-70B) to perform chain-of-thought reasoning in ROT-13-encoded text while maintaining intelligible English outputs. We evaluate mechanistic interpretability methods, in particular logit lens analysis, on their ability to decode the model's hidden reasoning process using only internal activations. We show that logit lens can effectively translate encoded reasoning, with accuracy peaking in intermediate-to-late layers. Finally, we develop a fully unsupervised decoding pipeline that combines logit lens with automated paraphrasing, achieving substantial accuracy in reconstructing complete reasoning transcripts from internal model representations. These findings suggest that current mechanistic interpretability techniques may be more robust to simple forms of encoded reasoning than previously understood. Our work provides an initial framework for evaluating interpretability methods against models that reason in non-human-readable formats, contributing to the broader challenge of maintaining oversight over increasingly capable AI systems.
- North America > United States > Illinois > Sangamon County > Springfield (0.14)
- North America > United States > Illinois > Cook County > Chicago (0.07)
- North America > United States > California > Sacramento County > Sacramento (0.05)
- (22 more...)
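A toy sketch of the two ingredients described in the abstract, using random stand-in weights rather than the actual R1-Distill model: ROT-13 encoding via Python's standard library, and the logit lens as a projection of an intermediate hidden state through the final norm and unembedding matrix. Dimensions and weights here are arbitrary.

```python
import codecs
import numpy as np

# ROT-13 "encoded reasoning" is trivially invertible text (stdlib codec).
print(codecs.encode("hello", "rot13"))  # uryyb

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def logit_lens(hidden, W_U):
    """Read token predictions off a layer-k hidden state: norm, then unembed."""
    return layer_norm(hidden) @ W_U  # shape [seq, vocab]

rng = np.random.default_rng(0)
d_model, vocab, seq = 16, 100, 5
hidden = rng.normal(size=(seq, d_model))  # stand-in for layer-k activations
W_U = rng.normal(size=(d_model, vocab))   # stand-in unembedding matrix
tokens = logit_lens(hidden, W_U).argmax(-1)  # most likely token per position
print(tokens.shape)  # (5,)
```

In a real model the same projection is applied at every layer, and the layer at which the decoded tokens become readable English is what the paper measures.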
Model Directions, Not Words: Mechanistic Topic Models Using Sparse Autoencoders
Zheng, Carolina, Beltran-Velez, Nicolas, Karlekar, Sweta, Shi, Claudia, Nazaret, Achille, Mallik, Asif, Feder, Amir, Blei, David M.
Traditional topic models are effective at uncovering latent themes in large text collections. However, due to their reliance on bag-of-words representations, they struggle to capture semantically abstract features. While some neural variants use richer representations, they are similarly constrained by expressing topics as word lists, which limits their ability to articulate complex topics. We introduce Mechanistic Topic Models (MTMs), a class of topic models that operate on interpretable features learned by sparse autoencoders (SAEs). By defining topics over this semantically rich space, MTMs can reveal deeper conceptual themes with expressive feature descriptions. Moreover, uniquely among topic models, MTMs enable controllable text generation using topic-based steering vectors. To properly evaluate MTM topics against word-list-based approaches, we propose topic judge, an LLM-based pairwise comparison evaluation framework. Across five datasets, MTMs match or exceed traditional and neural baselines on coherence metrics, are consistently preferred by topic judge, and enable effective steering of LLM outputs.
- Europe > Russia (0.14)
- Asia > Russia (0.14)
- Asia > Middle East > Jordan (0.04)
- (10 more...)
- Leisure & Entertainment > Sports (1.00)
- Law (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- (2 more...)
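A hedged sketch of the core idea, assuming we already have a document-by-SAE-feature activation matrix (random here, not from a trained autoencoder): factor it so that topics become distributions over SAE features rather than words. The paper's actual model is more involved; plain NMF with multiplicative updates serves as a stand-in.

```python
import numpy as np

# X[d, f] = activation of SAE feature f on document d (toy random data).
rng = np.random.default_rng(0)
n_docs, n_feats, n_topics = 20, 50, 3
X = rng.random((n_docs, n_feats))

# NMF via Lee-Seung multiplicative updates: X ~ W @ H, where
# W[d, k] = topic weights per document, H[k, f] = feature weights per topic.
W = rng.random((n_docs, n_topics)) + 0.1
H = rng.random((n_topics, n_feats)) + 0.1
for _ in range(200):
    H *= (W.T @ X) / (W.T @ W @ H + 1e-9)
    W *= (X @ H.T) / (W @ H @ H.T + 1e-9)

# Each topic is summarized by its top SAE features, which (in MTMs) carry
# natural-language feature descriptions rather than single words.
top_feats = np.argsort(-H, axis=1)[:, :5]
print(top_feats.shape)  # (3, 5)
```

The steering claim in the abstract corresponds to mapping a topic's feature weights back through the SAE decoder to get a direction in activation space.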
WavePulse: Real-time Content Analytics of Radio Livestreams
Mittal, Govind, Gupta, Sarthak, Wagle, Shruti, Chopra, Chirag, DeMattee, Anthony J, Memon, Nasir, Ahamad, Mustaque, Hegde, Chinmay
Radio remains a pervasive medium for mass information dissemination, with AM/FM stations reaching more Americans than either smartphone-based social networking or live television. Increasingly, radio broadcasts are also streamed online and accessed over the Internet. We present WavePulse, a framework that records, documents, and analyzes radio content in real-time. While our framework is generally applicable, we showcase the efficacy of WavePulse in a collaborative project with a team of political scientists focusing on the 2024 Presidential Elections. We use WavePulse to monitor livestreams of 396 news radio stations over a period of three months, processing close to 500,000 hours of audio streams. These streams were converted into time-stamped, diarized transcripts and analyzed to answer key political science questions at both the national and state levels. Our analysis revealed how local issues interacted with national trends, providing insights into information flow. Our results demonstrate WavePulse's efficacy in capturing and analyzing content from radio livestreams sourced from the Web. Code and dataset can be accessed at https://wave-pulse.io.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > New York > Kings County > New York City (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- (215 more...)
- Media > Radio (1.00)
- Leisure & Entertainment (1.00)
- Government > Voting & Elections (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
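A schematic sketch of the downstream analysis step, with invented segment data and placeholder names (not WavePulse's actual API): once audio is converted into time-stamped, diarized transcript segments, many content questions reduce to simple queries over those segments.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # seconds into the stream
    end: float
    speaker: str   # diarization label, e.g. "SPEAKER_00"
    text: str

def analyze(segments, keywords):
    """Count keyword mentions per speaker across transcript segments."""
    counts = {}
    for seg in segments:
        for kw in keywords:
            if kw.lower() in seg.text.lower():
                counts[(seg.speaker, kw)] = counts.get((seg.speaker, kw), 0) + 1
    return counts

# Hypothetical transcript fragments, not real WavePulse output.
segments = [
    Segment(0.0, 4.2, "SPEAKER_00", "Early voting opens today across the state."),
    Segment(4.2, 9.8, "SPEAKER_01", "Turnout in the election could set records."),
]
print(analyze(segments, ["voting", "election"]))
# {('SPEAKER_00', 'voting'): 1, ('SPEAKER_01', 'election'): 1}
```

The recording, transcription, and diarization stages themselves depend on external speech models and are omitted here.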
Transformer visualization via dictionary learning: contextualized embedding as a linear superposition of transformer factors
Yun, Zeyu, Chen, Yubei, Olshausen, Bruno A, LeCun, Yann
Transformer networks have revolutionized NLP representation learning since they were introduced. Though great effort has been made to explain the representations in transformers, it is widely recognized that our understanding is not sufficient. One important reason is the lack of visualization tools for detailed analysis. In this paper, we propose to use dictionary learning to open up these "black boxes" as linear superpositions of transformer factors. Through visualization, we demonstrate the hierarchical semantic structures captured by the transformer factors, e.g., word-level polysemy disambiguation, sentence-level pattern formation, and long-range dependency. While some of these patterns confirm conventional prior linguistic knowledge, the rest are relatively unexpected and may provide new insights. We hope this visualization tool can bring further knowledge and a better understanding of how transformer networks work. The code is available at https://github.com/zeyuyun1/TransformerVis
- Europe > Jersey (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Oceania > New Zealand (0.14)
- (63 more...)
- Personal > Obituary (0.92)
- Research Report > New Finding (0.67)
- Transportation > Ground (1.00)
- Transportation > Air (1.00)
- Media > Music (1.00)
- (13 more...)
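A toy sketch of the "linear superposition" view with a random (not learned) dictionary: approximate one contextualized embedding as a sparse combination of unit-norm atoms via matching pursuit. The paper learns its dictionary from transformer activations; everything below is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_atoms, k = 32, 200, 5
Phi = rng.normal(size=(n_atoms, d))
Phi /= np.linalg.norm(Phi, axis=1, keepdims=True)  # unit-norm "transformer factors"
x = rng.normal(size=d)                             # stand-in contextualized embedding

# Greedy matching pursuit: x ~ sum_j coeffs[j] * Phi[j] with at most k atoms.
residual, coeffs = x.copy(), np.zeros(n_atoms)
for _ in range(k):
    scores = Phi @ residual
    j = np.argmax(np.abs(scores))   # best-matching atom
    coeffs[j] += scores[j]
    residual -= scores[j] * Phi[j]  # remove its contribution

print(np.count_nonzero(coeffs))     # 5 atoms used
print(np.linalg.norm(residual) < np.linalg.norm(x))  # True: error shrank
```

Visualizing which inputs most strongly activate each atom is what yields the word-level and sentence-level patterns the abstract describes.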
Single Index Latent Variable Models for Network Topology Inference
Mei, Jonathan, Moura, José M. F.
A semi-parametric, non-linear regression model in the presence of latent variables is applied to learning network graph structure. These latent variables can correspond to unmodeled phenomena or unmeasured agents in a complex system of interacting entities. This formulation jointly estimates non-linearities in the underlying data generation, the direct interactions between measured entities, and the indirect effects of unmeasured processes on the observed data. The learning is posed as regularized empirical risk minimization. Details of the algorithm for learning the model are outlined. Experiments demonstrate the performance of the learned model on real data.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Rocky Mountains (0.04)
- North America > United States > North Dakota > Cass County > Fargo (0.04)
- (3 more...)
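A minimal sketch of the regularized empirical risk minimization step under strong simplifying assumptions: plain sparse linear regression (Lasso via ISTA) on synthetic data, with the paper's latent-variable and link-function components omitted. One node is regressed on the others, and candidate edges are read off the nonzero coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
true_w = np.zeros(p)
true_w[[1, 4]] = [1.5, -2.0]              # node 0 truly depends on nodes 1 and 4
y = X @ true_w + 0.1 * rng.normal(size=n)

lam = 0.1
step = n / np.linalg.norm(X, 2) ** 2      # 1 / Lipschitz constant of the gradient
w = np.zeros(p)
for _ in range(500):
    grad = X.T @ (X @ w - y) / n          # gradient of the empirical risk
    w = w - step * grad
    w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)  # soft-threshold (L1)

edges = np.flatnonzero(np.abs(w) > 1e-3)  # recovered neighbors of node 0
print(edges)
```

Repeating this per node and symmetrizing gives a graph estimate; SILVar additionally learns a non-linear link and a low-rank term for the unmeasured processes.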
SILVar: Single Index Latent Variable Models
Mei, Jonathan, Moura, Jose' M. F.
How real is this relationship? This is a ubiquitous question that presents itself not only in judging interpersonal connections but also in evaluating correlations and causality throughout science and engineering. Two reasons for reaching incorrect conclusions based on observed relationships in collected data are chance and outside influences. For example, we can flip two coins that both show heads, or observe that today's temperature measurements on the west coast of the continental USA seem to correlate with tomorrow's on the east coast throughout the year. In the first case, we might not immediately conclude that the coins are related, since the number of flips we observe is not very large relative to the possible variance of the process, and the apparent link we observed is due to chance. In the second case, we still may hesitate to use west coast weather to understand and predict east coast weather, since in reality both are closely following a seasonal trend. Establishing interpretable relationships between entities while mitigating the effects of chance can be achieved via sparse optimization methods, such as regression (Lasso) [1] and inverse covariance estimation [2]. In addition, the extension to time series via vector autoregression [3], [4] yields interpretations related to Granger causality [5]. In each of these settings, estimated nonzero values correspond to actual relations, while zeros correspond to absence of relations.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (9 more...)
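A toy illustration, on synthetic data, of the two failure modes the abstract walks through: chance correlation in small samples, and a shared trend acting as a confounder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Chance: short independent sequences (think a handful of coin flips) often
# show sizable spurious correlation; long ones do not.
small = [abs(np.corrcoef(rng.normal(size=5), rng.normal(size=5))[0, 1])
         for _ in range(200)]
large = [abs(np.corrcoef(rng.normal(size=5000), rng.normal(size=5000))[0, 1])
         for _ in range(200)]
print(np.mean(small) > np.mean(large))  # True: small samples inflate |corr|

# Confounding: west- and east-coast temperatures both track the seasonal
# cycle, so they correlate strongly even with independent day-to-day noise.
t = np.arange(365)
season = 10 * np.sin(2 * np.pi * t / 365)
west = season + rng.normal(size=365)
east = season + rng.normal(size=365)
print(np.corrcoef(west, east)[0, 1] > 0.9)  # True: driven by the shared trend
```

The sparse methods cited in the abstract (Lasso, inverse covariance estimation, sparse VAR) address exactly these two effects: the L1 penalty suppresses chance-level correlations, and conditioning on other variables absorbs shared trends.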
Flipboard on Flipboard
There are more than 8,000 online courses out there. These are some of the best. More than 50 million students signed up for one this year. When scientists announce they've made a breakthrough, they usually promise we'll see the full effects of those discoveries--anything from a better understanding of how the universe works to a drug ready for use in patients--in about five years. TULSA -- Tom Coomer has retired twice: once when he was 65, and then several years ago.
- North America > United States > New York > Bronx County > New York City (0.05)
- North America > United States > Montana > Lewis and Clark County > Helena (0.05)
- Europe > Italy (0.05)
- Africa > Liberia (0.05)
- Government (1.00)
- Education > Educational Setting > Online (0.79)
- Media (0.72)
Top 5 misconceptions about data science PACKT Books
Data science is a well-defined, serious field of study and work. But the term 'data science' has become a bit of a buzzword. Yes, 'data scientists' have become increasingly important to many different types of organizations, but the title has also become a trend term in tech recruitment. The fact that these words are thrown around so casually has led to a lot of confusion about what data science actually is and what data scientists actually do. I formerly counted myself among the confused.
- Information Technology > Data Science (1.00)
- Information Technology > Communications > Social Media (0.50)
- Information Technology > Artificial Intelligence > Machine Learning (0.31)